These lecture notes are from Professor Kim Kwangsu's Advanced Deep Learning course (Fall 2023), Jeonbuk National University.
Submission deadline: 11/10
1
Taylor series expansion and gradient: 2nd order Taylor series expansion
\[f(x) \approx f(x_0) + (x-x_0)^T \nabla f(x_0) + \dfrac{1}{2}(x-x_0)^T H(x_0)(x-x_0), x \in \mathbb{R}^d\]
When we move from \(x_0\) to \(x = x_0 - \epsilon \odot \nabla f(x_0)\), what is the optimal \(\epsilon\) for the minimization of \(f\) under this approximation?
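As a starting point, consider the scalar-step case \(\epsilon \in \mathbb{R}\) (so that \(\odot\) reduces to ordinary scalar multiplication). Writing \(g = \nabla f(x_0)\) and substituting \(x - x_0 = -\epsilon g\) into the expansion gives
\[f(x_0 - \epsilon g) \approx f(x_0) - \epsilon\, g^T g + \dfrac{1}{2}\epsilon^2\, g^T H(x_0)\, g,\]
a quadratic in \(\epsilon\) that can be minimized directly whenever \(g^T H(x_0) g > 0\).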
2
Back Prop.: Considering the following architecture.
Input layer: \((x_1, x_2)\)
1st layer: \(l^{(1)}_k = \text{max} \{ 0, \sum_{j=1}^2 w^{(1)}_{kj} x_j + b^{(1)}_k \}, k = 1, \dots , s_1\)
2nd layer: \(l^{(2)}_k = \text{max} \{ 0, \sum_{j=1}^{s_1} w^{(2)}_{kj} l_j^{(1)} + b^{(2)}_k \}, k = 1, \dots , s_2\)
3rd layer: \(l^{(3)}_k = \text{max} \{ 0, \sum_{j=1}^{s_2} w^{(3)}_{kj} l_j^{(2)} + b^{(3)}_k \}, k = 1, \dots , s_3\)
4th layer: \(l^{(4)}_k = \text{max} \{ 0, \sum_{j=1}^{s_3} w^{(4)}_{kj} l_j^{(3)} + b^{(4)}_k \}, k = 1, \dots , s_4\)
Output layer: \(f = \sum_{j=1}^{s_4} w_j l_j^{(4)} + b\)
Loss function: \(\ell(y, f)\)
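A minimal NumPy sketch of the forward pass defined above may help fix the notation (the container names `W`, `b`, `w_out`, `b_out` and the `relu` helper are illustrative, not part of the problem statement):

```python
import numpy as np

def relu(z):
    # elementwise max{0, z}, matching the layer definitions above
    return np.maximum(0.0, z)

def forward(x, W, b, w_out, b_out):
    """Forward pass of the four-hidden-layer ReLU network above.

    W = [W1, ..., W4], where W[l] has shape (s_l, s_{l-1}) with s_0 = 2 (input dim);
    b = [b1, ..., b4] are the bias vectors; (w_out, b_out) are the output-layer
    parameters (w_j, b).
    """
    h = np.asarray(x, dtype=float)      # l^{(0)} := x
    for Wl, bl in zip(W, b):
        h = relu(Wl @ h + bl)           # l^{(l)}_k = max{0, sum_j w^{(l)}_{kj} l^{(l-1)}_j + b^{(l)}_k}
    return float(w_out @ h + b_out)     # f = sum_j w_j l^{(4)}_j + b
```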
(a)
Calculate \(\dfrac{\partial \ell}{\partial w^{(l)}_{kj}}\) for \((l, k, j) \in \{4, 1\} \times \{1\} \times \{1, 2\}\)
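For reference, the standard backpropagation recursion for this architecture, with pre-activations \(z^{(l)}_k = \sum_j w^{(l)}_{kj} l^{(l-1)}_j + b^{(l)}_k\) and \(l^{(0)} := x\), is
\[\dfrac{\partial \ell}{\partial w^{(l)}_{kj}} = \delta^{(l)}_k\, l^{(l-1)}_j, \qquad \delta^{(4)}_k = \dfrac{\partial \ell}{\partial f}\, w_k\, \mathbf{1}\!\left[z^{(4)}_k > 0\right], \qquad \delta^{(l)}_k = \mathbf{1}\!\left[z^{(l)}_k > 0\right] \sum_m w^{(l+1)}_{mk}\, \delta^{(l+1)}_m \quad (l < 4).\]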
(b)
Assume that we have two mini-batches, \(\{x = (2, 3), y = 6\}\) and \(\{x = (1, 4), y = 7\}\), and that \(\ell = (y - f)^2\). Also, initialize all values of \(w^{(l)}_{kj}\), \(w_j\), \(b^{(l)}\), and \(b\) to \(1/10\). Consider the update rule \(w_{t+1} = w_t - \epsilon \nabla_w \ell_t\).
(c)
Calculate the updated value of \(w^{(2)}_{1,2}\) after the first step, using only the first batch.
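A hand-derived gradient such as the one in (c) can be sanity-checked numerically. Below is a minimal central finite-difference checker (a sketch; the function and argument names are illustrative): perturb only the entry corresponding to \(w^{(2)}_{1,2}\) while every other parameter stays at \(1/10\), evaluate \(\ell = (y - f)^2\) on the first batch \(\{x = (2, 3), y = 6\}\), and compare with the analytic gradient before applying the update.

```python
import numpy as np

def numeric_grad(loss_fn, theta, i, eps=1e-6):
    """Central finite-difference estimate of d loss / d theta_i.

    loss_fn maps a flat NumPy parameter vector theta to a scalar loss;
    i is the index of the single parameter being perturbed.
    """
    theta_plus, theta_minus = theta.copy(), theta.copy()
    theta_plus[i] += eps
    theta_minus[i] -= eps
    return (loss_fn(theta_plus) - loss_fn(theta_minus)) / (2 * eps)
```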
3
Complexity of DNN in experiments: The attached code is for regression (the mpg dataset on Colab). The aim is to predict mpg from various input variables. The final outputs are the MSE and a scatter plot of the predictions against the weight variable. Report the MSE and this scatter plot when the number of layers is increased or decreased, or when the activation function is changed to linear, and comment on the results. The model architecture can be modified in the lines of def build_model().
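The attached notebook is not reproduced here; as a rough sketch of what the build_model() knobs look like in Keras (layer count, width, and activation are the parameters to vary; the exact argument names below are assumptions, not the attached code):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_hidden=2, units=64, activation="relu"):
    """Feed-forward regressor for the mpg data.

    Vary n_hidden to add or remove layers, and set activation="linear"
    to run the comparisons requested above.
    """
    model = keras.Sequential()
    for _ in range(n_hidden):
        model.add(layers.Dense(units, activation=activation))
    model.add(layers.Dense(1))  # single regression output (predicted mpg)
    model.compile(optimizer="adam", loss="mse", metrics=["mse"])
    return model
```

With this shape of build_model(), the requested variants would be, for example, build_model(n_hidden=1), build_model(n_hidden=4), and build_model(activation="linear").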